NYC Bike Violations

Introduction

Cycling is… While one can find a great deal of information about general cycling in NYC, it is harder to find information specifically about bike tickets.

Throughout this report, I will use the term ‘cyclist’ to refer to regular bike riders, ebike riders, and escooter riders, and ‘bicycle’ to refer to any of the vehicles that fall under these labels.

Data

Datasets

The datasets used were all taken from the NYC Open Data website, except the violation code description data, which was taken from the NY DMV website. The violation data was merged from two different datasets: a ‘year to date’ dataset spanning Jan 1, 2023 to June 30, 2023, and a ‘historical’ dataset spanning Jan 1, 2018 to December 31, 2022. These datasets contained all violations issued by the NYPD, but only the violations labeled ‘Bike’, ‘Ebike’, and ‘Escooter’ were kept for this report. The bicycle count data was aggregated per day and per week, and these values were appended to the merged violations dataset. The violation code description data was also merged with the violations dataset, so that a textual description of each violation code could easily be accessed.
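The merge described above can be sketched in a few lines. This is a minimal illustration, not the code used for the report: the field names (`vehicle_label`, `violation_code`, `date`), the toy rows, and the car code are all assumptions made up for the example.

```python
def merge_violations(year_to_date, historical, code_descriptions,
                     kept_labels=("Bike", "Ebike", "Escooter")):
    """Stack both extracts, keep cyclist rows, and attach a code description."""
    merged = []
    for row in historical + year_to_date:
        if row["vehicle_label"] not in kept_labels:
            continue  # drop violations issued to other vehicle types
        enriched = dict(row)
        # Codes without a known description get None rather than being dropped.
        enriched["description"] = code_descriptions.get(row["violation_code"])
        merged.append(enriched)
    return merged


# Toy rows with hypothetical field names; "CAR-CODE" is a placeholder, not a real code.
historical = [
    {"date": "2018-05-03", "violation_code": "1111D1C", "vehicle_label": "Bike"},
    {"date": "2019-08-15", "violation_code": "CAR-CODE", "vehicle_label": "Car"},
]
year_to_date = [
    {"date": "2023-02-10", "violation_code": "1236B", "vehicle_label": "Ebike"},
]
code_descriptions = {
    "1111D1C": "BICYCLE OR SKATEBOARD FAILED TO STOP AT RED LIGHT- NYC",
    "1236B": "NO BELL OR SIGNAL DEVICE ON BICYCLE",
}

cyclist_violations = merge_violations(year_to_date, historical, code_descriptions)
```

Using a lookup with `.get` keeps rows whose code has no description, which mirrors a left join onto the violations table.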

Bike Counters

Bicycle counter locations throughout the 5 boroughs. Note that two counters on Amsterdam Ave on the west side of Central Park have the exact same latitude and longitude in the .csv file, but in reality one is on Amsterdam Ave counting uptown traffic and the other is an avenue over on Columbus Ave, counting downtown traffic.

The bicycle count data were taken from 18 separate bike counters, spread throughout the 5 boroughs, as can be seen in the map above. Although a total of 29 were listed in the .csv file, some of those were either duplicates or not applicable (i.e., they only counted pedestrians) and were removed from consideration. The majority of these counters could be considered ‘Manhattan-centric.’ For instance, there is only one counter in Staten Island, by the ferry access to Manhattan, one in a single area in the south Bronx, and only three in Queens, with one of those placed near the Queensboro Bridge to Manhattan. Based on this unequal representation, I chose not to perform any borough-specific analyses involving rider count data (the violation data, however, did not depend on bike counters), and any generalizations or takeaways from this report should keep these limitations in mind. The bike counters work by counting the number of cyclists that cross them and recording the total every 15 minutes. The bike counter data ranged from 2012 to present, but only the data from 2018 to present were used.
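As a rough sketch of how 15-minute counter readings roll up into the daily totals used later, assuming each reading is a (timestamp, count) pair in a hypothetical `YYYY-MM-DD HH:MM` format (the real .csv layout may differ):

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(readings):
    """Sum 15-minute counts from all counters into one total per calendar day."""
    totals = defaultdict(int)
    for timestamp, count in readings:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        totals[day.isoformat()] += count
    return dict(totals)


# Toy readings; the values are invented for illustration.
readings = [
    ("2022-06-01 08:00", 12),
    ("2022-06-01 08:15", 20),
    ("2022-06-02 08:00", 7),
]
per_day = daily_totals(readings)
```

The same pattern works for weekly or monthly totals by changing the grouping key.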

Removal of NA values

Of the 126,812 initial violation entries in the specified time frame, 111 entries were removed, leaving 126,701. Sixteen rows did not contain violation codes, 92 rows did not contain city name or location information, and 3 rows did not contain location information.
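A minimal sketch of this filtering step, assuming hypothetical field names (`violation_code`, `city`, `location`) and treating empty strings as missing:

```python
def drop_incomplete(rows, required=("violation_code", "city", "location")):
    """Return rows with every required field present, plus the number removed."""
    kept = [r for r in rows if all(r.get(f) not in (None, "") for f in required)]
    return kept, len(rows) - len(kept)


# Toy rows; the values are invented for illustration.
rows = [
    {"violation_code": "1111D1C", "city": "NEW YORK", "location": "1 AVE"},
    {"violation_code": "", "city": "BROOKLYN", "location": "4 AVE"},
    {"violation_code": "1236B", "city": "QUEENS", "location": None},
]
kept, removed = drop_incomplete(rows)

# The counts reported in the text are internally consistent.
assert 16 + 92 + 3 == 111
assert 126_812 - 111 == 126_701
```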

EDA: Cyclist Counts

Daily Total Cyclists

Top 10 Busiest Cycling Days

Violation Date | Daily Total Cyclists
--- | ---
2019-10-30 | 59750
2019-11-04 | 59291
2019-11-05 | 58329
2020-09-12 | 57637
2019-11-27 | 57456
2019-07-16 | 57394
2019-11-06 | 57384
2019-11-26 | 57323
2020-06-13 | 57195
2020-11-07 | 56453

The total number of cyclists per day ranged from 1665 to almost 60,000. Seven of the ten busiest days were in late October or November of 2019 or 2020, with the rest in June, July, and September. Further investigation would help uncover why this is, but my guess is that people might be trying to squeeze in one last ride before winter.

Daily Total Cyclists over Time

From the plot of daily total cyclists, a general increasing trend can be seen, as well as strong seasonality. There are more cyclists in the warmer summer months than in the colder months. The slope of the regression line is 6.537613, which indicates that the daily total grows by approximately 6.5 counted cyclists per day on average, or roughly 2,400 over a year.
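To show what the slope measures, here is a small least-squares sketch that regresses daily totals on the day index. The toy counts are invented, and this is not necessarily how the regression in the report was fitted.

```python
def ols_slope(y):
    """Least-squares slope of y regressed on the day index 0, 1, 2, ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


# Toy daily totals that rise by about 6.5 per day.
toy_counts = [100, 107, 113, 120, 126]
slope = ols_slope(toy_counts)

# A slope in "counts per day" scales to a yearly change by multiplying by 365.
yearly_change = 6.537613 * 365
```

Because the x-axis is measured in days, the slope is a change in the daily total per elapsed day, not a count of individual new riders.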

Seasonality

Let’s further explore the seasonality. It’s easier to view this with monthly totals:

The increase in ridership in the warmer months can clearly be seen. Also note the yearly increase, as well as the dip in April of 2020 during covid lockdown.

Busiest Bike Counters

Busiest Bike Counters in 2022

Bike counter location | Number of Cyclists counted in 2022
--- | ---
Williamsburg side of Williamsburg bridge | 1964902
Queens side of Queensboro Bridge | 1818163
Manhattan side of Manhattan bridge | 1584788
Kent ave in Williamsburg | 1066870
Brooklyn Bridge | 1002070

The busiest bike counters in 2022 were at bridge access points, as well as on Kent Ave in Williamsburg, which feeds onto the Williamsburg Bridge. For 2023, up until June 30th, the order is mostly the same, with Kent Ave and the Brooklyn Bridge switching places. An analysis of daily total count data from just the Williamsburg Bridge counter revealed similar trends to the total count data from all counters.

EDA: Violations

Violations over time

There were 126,701 total violations consisting of 188 different types of violations handed out to cyclists during the time frame this data was collected. As we can see from the daily violations plot, there was a sharp drop-off in violations handed out after covid lockdown.

Violations per borough

Per borough, we can see that Manhattan had the most violations by far, with Brooklyn coming in second. More information is required to determine why. It’s possible that the police presence in Manhattan is higher. It is also possible that there are far more cyclists in Manhattan than in the other boroughs, although this cannot be determined from the current data due to the disproportionate bike counter placement.

Most Common Violations

Top 10 Violations

Violation Code | Description | Total | Percent
--- | --- | --- | ---
1111D1C | BICYCLE OR SKATEBOARD FAILED TO STOP AT RED LIGHT- NYC | 55933 | 44.1
1110AB | DISOBEYED TRAFFIC DEVICE WHILE OPERATING BICYCLE | 17166 | 13.5
1127AB | DRIVING WRONG DIRECTION ON ONE-WAY STREET - BICYCLE | 7995 | 6.3
403A3IX | BICYCLE FAILED TO YIELD TO VEHICLE/PEDESTRIAN AT RED LIGHT- NYC | 6682 | 5.3
37524AB | OPER BICYCLE WITH MORE 1 EARPHONE | 6052 | 4.8
1236B | NO BELL OR SIGNAL DEVICE ON BICYCLE | 3959 | 3.1
412P1 | BIKING OFF LANE- NYC | 3673 | 2.9
407C31 | BIKE/SKATE ON SIDEWALK-NYC | 3459 | 2.7
1232A | IMPROPER OPERATION OF BICYCLE | 2449 | 1.9
1111D1N | NYC REDLIGHT | 1925 | 1.5

From this table, we can see that improper traffic behavior at red lights (1111D1C, 403A3IX) accounts for ~50% of all bicycle violations. Other common violations include riding in the wrong direction, operating a bicycle with more than 1 earphone, not having a bell, and riding on the sidewalk.
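The ~50% figure can be checked directly from the table totals and the overall count of 126,701 violations. A short sketch (the helper name `percent_share` is just for illustration):

```python
TOTAL_VIOLATIONS = 126_701

# Totals for the two red-light codes, copied from the Top 10 Violations table.
red_light_counts = {"1111D1C": 55_933, "403A3IX": 6_682}


def percent_share(count, total=TOTAL_VIOLATIONS):
    """Share of all violations, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


red_light_share = percent_share(sum(red_light_counts.values()))
print(red_light_share)  # 49.4
```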

Looking at just one year’s worth of the most common violation (1111D1C, since June 2022), we can see a few cluster areas emerge:

* Upper West and Upper East sides of Manhattan (and their corresponding avenues up through Harlem)
* Up and down 1st and 2nd Aves, especially in the East Village
* On the Brooklyn side of the Williamsburg Bridge
* Up and down 4th and 5th Aves in Brooklyn
* Liberty Ave in Ozone Park/Richmond Hill
* The east side of Prospect Park on Bedford Ave
* Bensonhurst, Brooklyn

If you are riding through these areas, please be extra careful about stopping at stop signs and red lights, as these are the areas where the most red-light tickets were issued to cyclists.

Notable Violation

This specific violation (Code 12332, ATTACHING SELF TO MOVING MOTOR VEHICLE), which occurred only once, on May 1, 2018 at 15:07:17, is highly notable, as it is direct proof that at least one famous time traveller graced us with his presence near West 57th St and 9th Ave:

Busiest Violation Days

Top 10 busiest violation days

Violation Date | Daily Total Violations
--- | ---
2019-08-15 | 324
2019-08-09 | 312
2019-10-02 | 308
2018-07-31 | 306
2018-06-19 | 302
2018-05-03 | 297
2019-10-01 | 291
2019-04-25 | 290
2018-08-29 | 284
2018-04-12 | 275

All ten of the busiest violation days fell in 2018 or 2019, and half of them fell between June and August. This is consistent with the sharp drop-off in violations after covid lockdown, and with both ridership and violations being higher in the warmer months than in the colder months.

Forecasting: Rider Counts

Since the seasonality for rider counts is yearly, I used monthly total ridership instead of daily total ridership for modeling. Daily total ridership proved to be too noisy.

The training set was from the beginning of the data set, Jan 2018, to May 2022. The test set contained the data from June 2022 to June 2023.

Five different models for monthly bike counts were attempted: a seasonal naive forecast, three exponential smoothing models (ETS AAA, ETS AAdA, and the ETS function’s choice, ETS MNM), and the ARIMA function’s choice (ARIMA (1,0,0)(1,1,0)[12]).
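As an illustration of the simplest of these baselines, here is a sketch of a seasonal naive forecast together with the RMSE used to compare models. The toy series uses a period of 4 to stay short; the monthly data in this report would use a period of 12. The function names are illustrative and not taken from the forecasting library used for the report.

```python
import math


def seasonal_naive(train, horizon, period=12):
    """Forecast each future step with the value from the same season one cycle back."""
    last_season = train[-period:]
    # The modulo lets the horizon run longer than one full season.
    return [last_season[h % period] for h in range(horizon)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy series with a period of 4; values are invented for illustration.
train = [10, 20, 30, 40, 12, 22, 32, 42]
test = [14, 24, 34, 44, 16]

forecast = seasonal_naive(train, horizon=len(test), period=4)
error = rmse(test, forecast)
```

The wrap-around matters here because the test set (June 2022 to June 2023) covers 13 months, one more than a full seasonal cycle.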

Since there was strong seasonality, I decided to focus on exponential smoothing models. An ARIMA and seasonal naive forecast were modeled for comparison. When allowing the ETS function to choose what it thought would be the best model, it chose a multiplicative ETS model (MNM). I do not know why the function chose this. This highlights the importance of understanding analysis techniques when choosing the best model, and not leaving the decision up to an algorithm.

Forecasts

Forecasting Method | RMSE on Test Set
--- | ---
Seasonal Naive | 139936.1
ETS_AAA | 86248.05
ETS_AAdA | 94780.86
ETS_MNM | 116723.8
ARIMA (1,0,0)(1,1,0)[12] | 114677.4

From the table, we can see that ETS_AAA had the lowest RMSE on the test set. Originally, I expected a model with damping to outperform one without damping, as there was a slight trend.

Model Diagnostics

The residuals for this model do not show anything out of the ordinary. Additionally, the Box-Pierce (p = 0.3329984) and Ljung-Box (p = 0.1743495) tests both fail to reject the null hypothesis of no autocorrelation, which is consistent with the residuals being white noise. The Shapiro-Wilk test also fails to reject the null hypothesis of normality (p = 0.1783), so there is no evidence that the residuals depart from a normal distribution.
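For readers unfamiliar with the two portmanteau tests, the sketch below computes their test statistics from a residual series. It only illustrates the formulas: the toy residuals are invented, the number of lags used in the report is not stated, and the p-value (obtained by comparing the statistic against a chi-squared distribution) is not computed here.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def portmanteau_statistics(residuals, max_lag):
    """Return the (Box-Pierce, Ljung-Box) statistics over lags 1..max_lag."""
    n = len(residuals)
    acfs = [autocorrelation(residuals, k) for k in range(1, max_lag + 1)]
    box_pierce = n * sum(r ** 2 for r in acfs)
    ljung_box = n * (n + 2) * sum(r ** 2 / (n - k)
                                  for k, r in enumerate(acfs, start=1))
    return box_pierce, ljung_box


# Strongly alternating toy residuals, i.e. clearly not white noise.
toy_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
bp, lb = portmanteau_statistics(toy_residuals, max_lag=1)
```

Ljung-Box reweights each lag by (n + 2) / (n - k), so it is always at least as large as Box-Pierce and tends to behave better on short series such as monthly data.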

Lastly, here is a plot using our ETS_AAA model to forecast ridership into 2025, with 80% and 95% prediction intervals shown:

Important Takeaways